 online environment


Adapting to Online Label Shift with Provable Guarantees

Neural Information Processing Systems

The standard supervised learning paradigm works effectively when training data shares the same distribution as the upcoming testing samples. However, this stationarity assumption is often violated in real-world applications, especially when testing data appear in an online fashion. In this paper, we formulate and investigate the problem of online label shift (OLaS): the learner trains an initial model from the labeled offline data and then deploys it to an unlabeled online environment where the underlying label distribution changes over time but the label-conditional density does not. The non-stationary nature and the lack of supervision make the problem challenging to tackle. To address the difficulty, we construct a new unbiased risk estimator that utilizes the unlabeled data, which exhibits many benign properties albeit with potential non-convexity. Building upon that, we propose novel online ensemble algorithms to deal with the non-stationarity of the environments. Our approach enjoys optimal dynamic regret, indicating that the performance is competitive with a clairvoyant who knows the online environments in hindsight and then chooses the best decision for each round. The obtained dynamic regret bound scales with the intensity and pattern of label distribution shift, hence exhibiting the adaptivity in the OLaS problem. Extensive experiments are conducted to validate the effectiveness of the approach and support our theoretical findings.


Mirage-1: Augmenting and Updating GUI Agent with Hierarchical Multimodal Skills

arXiv.org Artificial Intelligence

Recent efforts to leverage Multimodal Large Language Models (MLLMs) as GUI agents have yielded promising outcomes. However, these agents still struggle with long-horizon tasks in online environments, primarily due to insufficient knowledge and the inherent gap between offline and online domains. In this paper, inspired by how humans generalize knowledge in open-ended environments, we propose a Hierarchical Multimodal Skills (HMS) module to tackle the issue of insufficient knowledge. It progressively abstracts trajectories into execution skills, core skills, and ultimately meta-skills, providing a hierarchical knowledge structure for long-horizon task planning. To bridge the domain gap, we propose the Skill-Augmented Monte Carlo Tree Search (SA-MCTS) algorithm, which efficiently leverages skills acquired in offline environments to reduce the action search space during online tree exploration. Building on HMS, we propose Mirage-1, a multimodal, cross-platform, plug-and-play GUI agent. To validate the performance of Mirage-1 in real-world long-horizon scenarios, we constructed a new benchmark, AndroidLH. Experimental results show that Mirage-1 outperforms previous agents by 32%, 19%, 15%, and 79% on AndroidWorld, MobileMiniWob++, Mind2Web-Live, and AndroidLH, respectively. Project page: https://cybertronagent.github.io/Mirage-1.github.io/


Skill-based Safe Reinforcement Learning with Risk Planning

arXiv.org Artificial Intelligence

Safe Reinforcement Learning (Safe RL) aims to ensure safety when an RL agent learns by interacting with real-world environments where improper actions can induce high costs or lead to severe consequences. In this paper, we propose a novel Safe Skill Planning (SSkP) approach that makes safe RL more effective by exploiting auxiliary offline demonstration data. SSkP involves a two-stage process. First, we employ positive-unlabeled (PU) learning to learn a skill risk predictor from the offline demonstration data. Then, based on the learned skill risk predictor, we develop a novel risk planning process to enhance online safe RL and learn a risk-averse safe policy efficiently through interactions with the online RL environment, while simultaneously adapting the skill risk predictor to the environment. We conduct experiments in several benchmark robotic simulation environments. The experimental results demonstrate that the proposed approach consistently outperforms previous state-of-the-art safe RL methods.


AI incidents and 'networked trouble': The case for a research agenda

arXiv.org Artificial Intelligence

Against a backdrop of widespread interest in how publics can participate in the design of AI, I argue for a research agenda focused on AI incidents - examples of AI going wrong and sparking controversy - and how they are constructed in online environments. I take up the example of an AI incident from September 2020, when a Twitter user created a 'horrible experiment' to demonstrate the racist bias of Twitter's algorithm for cropping images. This resulted in Twitter not only abandoning its use of that algorithm, but also disavowing its decision to use any algorithm for the task. I argue that AI incidents like this are a significant means for participating in AI systems that require further research. That research agenda, I argue, should focus on how incidents are constructed through networked online behaviours that I refer to as 'networked trouble', where formats for participation enable individuals and algorithms to interact in ways that others - including technology companies - come to know and come to care about. At stake, I argue, is an important mechanism for participating in the design and deployment of AI.


Deepfake AI-generated people will sow chaos by 2024 as they would be impossible to detect, warns ex-White House chief

#artificialintelligence

DEEPFAKE AI-generated people will be among us by 2024 and will be nearly impossible to detect, a former White House official has warned. Pictures created by artificial intelligence, increasingly smart chatbots and sophisticated deepfake videos are already becoming hard to discern from reality. The technology is only going to become more advanced - with rapid developments already smoothing out the edges and finessing the programmes. Red flags are already being raised - as some imagery created by AI can already be almost indistinguishable from the real thing apart from a few telltale inconsistencies. The pictures at the top of this article are near perfect recreations of people's faces, created using the AI driven system Generated.Photos.


Gender bias in AI recruiting: How algorithms hold women back

#artificialintelligence

The odds are still stacked against women's success in the workplace, and artificial intelligence (AI) is only making it worse, a new report released on Tuesday claims. Because algorithms used in human resources systems are built on historical data reflecting past bias against women in the workplace, they tend to disadvantage women throughout their careers, according to the study, published on International Women's Day as a collaboration between UNESCO, the OECD and the Inter-American Development Bank. Here's a look at how AI bias in the workplace happens - and how it might be tackled. Workers increasingly find new opportunities through online job platforms such as Indeed and LinkedIn, and on social media such as Facebook and Twitter. The algorithms on these platforms influence which job opportunities people learn about, and how well-suited they perceive themselves to be for a particular role.


On Covariate Shift of Latent Confounders in Imitation and Reinforcement Learning

arXiv.org Artificial Intelligence

We consider the problem of using expert data with unobserved confounders for imitation and reinforcement learning. We begin by defining the problem of learning from confounded expert data in a contextual MDP setup. We analyze the limitations of learning from such data with and without external reward, and propose an adjustment of standard imitation learning algorithms to fit this setup. We then discuss the problem of distribution shift between the expert data and the online environment when the data is only partially observable. We prove possibility and impossibility results for imitation learning under arbitrary distribution shift of the missing covariates. When additional external reward is provided, we propose a sampling procedure that addresses the unknown shift and prove convergence to an optimal solution. Finally, we validate our claims empirically on challenging assistive healthcare and recommender system simulation tasks.


Reality Regained: An Inquiry into the Data Age

#artificialintelligence

Advances in computing power, methods of computing and machine learning all hugely expand the types of tasks that can be addressed and successfully resolved by machines. At the same time, the unprecedented uptake of lightweight technologies and the diffusion of digital platforms and social media grant the current online environment a social dimension that was only vaguely present in the early internet. Taken together, these developments establish a cultural context that increasingly quantifies daily pursuits and induces the framing of ordinary life issues in terms of data and whatever relations can be inferred from the crunching of large data volumes across lay and expert cultures. This is, on many counts, an epochal transformation through which the marks of a digital culture (data and data relations) crowd out the immediate reality of personal experience, experiential knowledge and situated interaction. Even though depersonalization and the diffusion of formal methods of living and knowing have been intrinsic to modernity, the current developments differ in some important respects that are worthy of being observed and analyzed.


Online SPARC for Drawing and Animation

AAAI Conferences

We developed a method to draw and animate using SPARC, a logic programming system, and an online environment to support this method. In particular, we introduce two predicates: one for drawing and one for animation. With our method, programmers write a SPARC program, using the introduced predicates, to specify their drawing or animation. The drawing or animation is then rendered when the program is executed with our system. Our online system provides an environment where programmers can easily edit and execute their programs.